DISTANCE-BASED CLUSTERING OF SPARSELY OBSERVED
STOCHASTIC PROCESSES, WITH APPLICATIONS TO 
ONLINE AUCTIONS 

By Jie Peng and Hans-Georg Müller

University of California, Davis 

We propose a distance between two realizations of a random pro- 
cess where for each realization only sparse and irregularly spaced 
measurements with additional measurement errors are available. Such 
data occur commonly in longitudinal studies and online trading data. 
A distance measure then makes it possible to apply distance-based 
analysis such as classification, clustering and multidimensional scal- 
ing for irregularly sampled longitudinal data. Once a suitable dis- 
tance measure for sparsely sampled longitudinal trajectories has been 
found, we apply distance-based clustering methods to eBay online 
auction data. We identify six distinct clusters of bidding patterns. 
Each of these bidding patterns is found to be associated with a spe- 
cific chance to obtain the auctioned item at a reasonable price. 

1. Introduction. The goal of cluster analysis is to group a collection 
of subjects into clusters, such that those falling into the same cluster are 
more similar to each other than those in different clusters. Therefore, a 
measure of similarity or dissimilarity between subjects is a necessary in- 
gredient for clustering. A metric defined on the subject space is one way 
to obtain dissimilarities, simply using the distance between two subjects 
as a measure of dissimilarity. While one can readily choose from a vari- 
ety of well-known metrics for the case of classical multivariate data, or for 
functional data that are in the form of continuously observed trajectories, 
finding a suitable distance measure for irregularly observed data can be a 
challenge. One such situation which we study here occurs in the commonly 
encountered case of irregularly and sparsely observed longitudinal data, 
with online auction data a prominent example [Shmueli and Jank (2005), 
Jank and Shmueli (2006), Shmueli, Russo and Jank (2007), Liu and Müller
(2008)]. As an example, a snapshot of an eBay auction history for a Palm
Personal Digital Assistant is shown in Figure 1. In this paper the focus is 
on a traditional clustering framework, where it is assumed that each sub- 
ject belongs to exactly one cluster. There are alternative clustering ideas 
such as soft clustering [Erosheva and Fienberg (2005)] or mixed member- 
ship clustering [Erosheva, Fienberg and Lafferty (2004)]. For example, in 
Erosheva, Fienberg and Joutard (2007), functional disability data are ana- 
lyzed by a grade of membership model, which allows subjects to have partial 
membership in several mixture components at the same time. 

Online auctions are generating increasingly large amounts of data for
which analysis tools are still scarce. eBay is one of the biggest online
auction marketplaces, and the eBay auction shown in Figure 1 is an example of
the type of single-item auctions on which we focus. These are 7-day auctions
set up as second-price auctions. eBay uses a proxy bidding system where
bidders submit the maximum amounts that they are willing to pay for the
item being auctioned, referred to as willingness-to-pay (WTP) amounts, and
the proxy system automatically increases each bidder's bid by a minimum
increment (which is relative to the current highest bid and set by eBay), until
either the bidder's maximum has been reached, or the bidder has the current
highest bid. During an auction, a bidder can submit as many WTP amounts
as desired. The winning bidder is determined according to who has submitted 
the highest bid at the end of the auction. The price the winning bidder pays 
corresponds to the second highest bid, plus an increment [Shmueli and Jank 
(2005)]. We refer to the series of all WTP bids, including the times within the 
auction at which these were submitted, as the "bid history" of a particular 
auction. It is noteworthy that consecutive WTP amounts can decrease and 
therefore are not constrained to be monotone increasing, since all but the 
highest current WTP amounts are visible during an ongoing auction. 
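The proxy rule described above can be illustrated with a small sketch. It is a simplification under stated assumptions, not eBay's actual system: the increment is a fixed hypothetical value (eBay's increment depends on the price level), ties go to the earlier bid, and the names `Bid` and `auction_outcome` are invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    bidder: str
    time_h: float  # hours since the start of the auction
    wtp: float     # willingness-to-pay amount in dollars


def auction_outcome(bids, opening_bid, increment=1.0):
    """Return (winner, price) under a simplified second-price proxy rule.

    The winner holds the highest WTP. The price is the second-highest WTP
    among the other bidders plus one increment, capped at the winner's own
    WTP. `increment` is a fixed, hypothetical value.
    """
    best = {}  # bidder -> (highest WTP so far, time of that bid)
    for b in sorted(bids, key=lambda b: b.time_h):
        if b.wtp < opening_bid:
            continue  # below the required opening bid
        if b.bidder not in best or b.wtp > best[b.bidder][0]:
            best[b.bidder] = (b.wtp, b.time_h)
    if not best:
        return None, opening_bid
    # Highest WTP first; earlier bid wins a tie.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1][0], kv[1][1]))
    winner, (top, _) = ranked[0]
    if len(ranked) == 1:
        return winner, opening_bid
    runner_up = ranked[1][1][0]
    return winner, min(top, runner_up + increment)


# A's second WTP (70) is lower than B's earlier WTP (80): the sequence of
# submitted WTP amounts need not be monotone increasing.
bids = [Bid("A", 10.0, 50.0), Bid("B", 100.0, 80.0),
        Bid("A", 150.0, 70.0), Bid("C", 167.5, 120.0)]
print(auction_outcome(bids, opening_bid=10.0))  # -> ('C', 81.0)
```

In this toy bid history the winner pays 81 dollars rather than the submitted 120, which is the second-price feature the text describes.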

For studying bidding behaviors in eBay auctions, we will focus here on 
the WTP amounts, as these reflect the intentions of bidders and therefore 
capture bidder behavior. A characteristic of online auction data is that the 
times when bids are placed are sparsely located within the time domain of 
an auction (7 days in our examples), as many bidders submit only very few 
bids (one or two) during a given auction, and the timing of their bids is 
irregular. It is well known that early and late phases in an auction attract 
more bidding activity than the middle phase. The often frenzied bidding 
activity at the end of an auction is referred to as bid sniping, and is caused 
by the desire of bidders to win a given auction. Besides the objective of 
winning, a main goal for most bidders is to win the item being auctioned at 
the lowest possible price. 

Fig. 1. Snapshot of a seven-day eBay auction.

The set of bids placed by an individual bidder during a specific auction
consists of a few snapshots, taken at the times the bidder places a bid, of an
underlying bidder trajectory which is a continuous function that corresponds
to a specific realization of a stochastic process and reflects the bidding
behavior. Our study is motivated by the goal to classify bidder trajectories
based on the observed bidding activities. Classifying bidder trajectories is
of interest in order to identify different bidding strategies. Bidders aim at
achieving a high chance of winning and/or winning the item at a low final 
price. Studying how different strategies connect to these somewhat con- 
flicting aims may help to differentiate various strategies in regard to their 
effectiveness to achieve these aims. As pointed out in Bapna et al. (2004), 
learning about bidding strategies also serves to enhance the design of online 
auction systems. In a recent paper by Jank and Shmueli (2008), the authors 
take a functional data view point and use curve clustering techniques to 
group the price trajectories of auctions. Their goal is to characterize het- 
erogeneity in the price formation process of online auctions and understand 
sources that affect the price dynamics. Therefore, the trajectories of inter- 
est in their study are derived from the ensemble of all bids placed by all 
bidders who participate in a certain auction, while we are interested in the 
study of individual bidder-specific trajectories and the clustering of these 
trajectories. 

A first step to differentiate between various bidding strategies is to define 
a distance between the various observed bidding behaviors, in order to derive 
a dissimilarity measure between different bidding patterns. The distance is 
to be based on the observed WTP bids for one bidder (in one auction), and 
as typical bidder trajectories are observed at only very few and irregular 
times, this leads to the challenge to define a distance based on sparse and 
irregularly timed data. Similar problems also arise in many other types of 
data from online environments, such as user reviews and weblog postings, 
where a single user might contribute a small number of entries for a specific 
topic/item. Assuming that an individual's bidding activity is a reflection of 
the realization of an underlying stochastic bid price process, this challenge 
motivates the development of a metric on the sample space of a stochas- 
tic process, where elements of this space are realizations of the underlying 
process with observations that consist of noisy measurements and are made 
at sparse and irregular time points. Once we have constructed a reasonable 
distance, we may base clustering methods on the resulting distance matrix, 
for example, one may apply multidimensional scaling or similar approaches. 
Implementing such a procedure, we find six distinct clusters of bidding pat- 
terns by analyzing the bids submitted during 158 seven-day auctions of Palm 
M515 Personal Digital Assistants (for more details, see Section 3). Interest- 
ingly, the chance of obtaining the auctioned item at a low price is closely 
associated with the bidding pattern/strategy: If the goal of the bidder is to 
win the auctioned item at a reasonable price, the resulting probabilities to 
achieve this goal show clear differences for the various bidding strategies, 
and one can identify better and worse strategies. 

If the entire trajectory of each realized bid price process were observed,
then the L2 norm in the space of square integrable functions would provide
a natural starting point for defining a metric. However, the L2 distance is
not readily calculable from the actually available noisy, sparse and irregularly
sampled measurements of the bid price process. Suppose one observes
a square integrable stochastic process $\{X(t) : t \in \mathcal{T}\}$ at a random number
of randomly located points in $\mathcal{T}$, with measurements corrupted by additive
i.i.d. random noise. The observations available from $n$ independent realizations
of the process are $\{Y_{il} : 1 \le l \le n_i, 1 \le i \le n\}$ with

\[ Y_{il} = X_i(T_{il}) + \varepsilon_{il}, \tag{1.1} \]

where $\{\varepsilon_{il}\}$ are i.i.d. with mean 0 and variance $\sigma^2$. Since $X$ is a square
integrable stochastic process, by Mercer's theorem [cf. Ash (1972)], there exists
a positive semidefinite kernel $C(\cdot,\cdot)$ such that $\mathrm{cov}(X(s), X(t)) = C(s,t)$ and
we have the following expansion of the process $X_i(t)$ in terms of the
eigenfunctions of the kernel $C(\cdot,\cdot)$:

\[ X_i(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_{ik}\phi_k(t), \tag{1.2} \]

where $\mu(\cdot) = E(X(\cdot))$ is the mean function; the random variables $\{\xi_{ik} : k \ge 1\}$
for each $i$ are uncorrelated with zero mean and variances $\lambda_k$, where $\sum_{k=1}^{\infty} \lambda_k < \infty$
and $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ are the eigenvalues of $C(\cdot,\cdot)$; and $\phi_k(\cdot)$ are the
corresponding orthonormal eigenfunctions.
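To make the data model (1.1) and (1.2) concrete, the following sketch simulates one sparsely observed realization. Everything specific in it is an assumption made for illustration only: a cosine basis stands in for the eigenfunctions, the scores and errors are Gaussian, observation times are uniform on [0, 168] hours, and the mean function, eigenvalues and noise level are invented.

```python
import math
import random


def phi(k, t, length=168.0):
    """k-th orthonormal cosine function on [0, length], k = 1, 2, ...
    (an illustrative stand-in for the eigenfunctions of C)."""
    return math.sqrt(2.0 / length) * math.cos(k * math.pi * t / length)


def simulate_sparse_curve(rng, mu, lambdas, sigma, max_obs=3, length=168.0):
    """Draw one realization of (1.2), truncated at K = len(lambdas), and
    observe it with noise at a few random times as in (1.1)."""
    # Scores xi_k with mean 0 and variance lambda_k (Gaussian by assumption).
    scores = [rng.gauss(0.0, math.sqrt(lam)) for lam in lambdas]
    n_i = rng.randint(1, max_obs)  # sparse: only 1..max_obs observations
    times = sorted(rng.uniform(0.0, length) for _ in range(n_i))

    def x(t):
        return mu(t) + sum(s * phi(k, t, length)
                           for k, s in enumerate(scores, start=1))

    obs = [x(t) + rng.gauss(0.0, sigma) for t in times]
    return times, obs, scores


rng = random.Random(0)
times, obs, scores = simulate_sparse_curve(
    rng, mu=lambda t: 50.0 + t, lambdas=[2000.0, 300.0], sigma=5.0)
print(len(times), "noisy observations of one latent trajectory")
```

Only the pairs of times and noisy values would be available to the analyst; the scores and the underlying curve remain latent.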

In the observed data model we assume that $\{T_{il} : l = 1, \ldots, n_i; 1 \le i \le n\}$
are randomly sampled from a (possibly unknown) distribution with a density
$g$ on $\mathcal{T}$. In the problems we study, the $n_i$ are typically small, reflecting that
the observed data consist of sparse and noisy realizations of a stochastic
process. We will define a distance between two such realizations $X_i$ and $X_j$
based on the observed data as described in the next section. This approach
is inspired by recent developments of functional data analysis methodology
for longitudinal data, notably the work of Yao, Müller and Wang (2005)
where trajectories are predicted from sparse and noisy observations, which
recently was adapted to online auctions [Liu and Müller (2008)]. Approaches
based on B-spline fitting with random coefficients which are suitable to 
fit similar data with random coefficient models have been proposed by 
Shi, Weiss and Taylor (1996), Rice and Wu (2001), and recently in the con- 
text of online auctions by Reithinger et al. (2008). 

For an up-to-date introduction to functional data analysis, we refer to the 
excellent book by Ramsay and Silverman (2005). Descriptions of the rapidly 
evolving interface between longitudinal and functional methodology and 
functional models for sparse longitudinal data can be found in Rice and Wu 
(2001), James, Hastie and Sugar (2001), James and Sugar (2003), and the 
overviews provided in Rice (2004), Zhao, Marron and Wells (2004) and Müller
(2005). The proposed distance is introduced in the following section. An
application to the clustering of bidding patterns in eBay online auctions is
the topic of Section 3, followed by concluding remarks. Proofs and auxiliary 
remarks can be found in an Appendix.

2. A distance for sparse and irregular data. We propose a distance between
the random curves $X_i$ and $X_j$ based on the observed data
$\mathbf{Y}_i = (Y_{i1}, \ldots, Y_{in_i})^T$ and $\mathbf{Y}_j = (Y_{j1}, \ldots, Y_{jn_j})^T$. The idea is to use the
conditional expectation of the L2 distance between these two curves, given the
data. Our analysis is conditional on the times of the measurements
$\{T_{il} : l = 1, \ldots, n_i; 1 \le i \le n\}$ and their numbers $\{n_i : 1 \le i \le n\}$.



2.1. Definition and basic properties. The L2 distance between two curves
$X_i$ and $X_j$ is defined as

\[ D(i,j) = \left\{ \int_{\mathcal{T}} (X_i(t) - X_j(t))^2 \, dt \right\}^{1/2} \]

and is not calculable in our situation, as only the sparse data $\mathbf{Y}_i$ and $\mathbf{Y}_j$
are observed. Therefore, we propose to use the conditional expectation of
$D^2(i,j)$, given $\mathbf{Y}_i$ and $\mathbf{Y}_j$, as the squared distance between $X_i$ and $X_j$,

\[ \tilde{D}(i,j) = \{ E(D^2(i,j) \mid \mathbf{Y}_i, \mathbf{Y}_j) \}^{1/2}, \qquad 1 \le i, j \le n. \tag{2.1} \]

Note that as a function of $\mathbf{Y}_i, \mathbf{Y}_j$, the $\tilde{D}(i,j)$s are random variables and
have the following properties, the proof of which is given in the Appendix.

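Proposition 2.1. $\tilde{D}$ satisfies the following properties:

1. $\tilde{D}(i,j) \ge 0$, $\tilde{D}(i,i) = 0$ and for $i \ne j$, $P(\tilde{D}(i,j) > 0) = 1$;

2. $\tilde{D}(i,j) = \tilde{D}(j,i)$;

3. For $1 \le i, j, k \le n$, $\tilde{D}(i,j) \le \tilde{D}(i,k) + \tilde{D}(k,j)$.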

Therefore, $\tilde{D}$ can be viewed as a metric on the subject space consisting of
random realizations $\{X_i(\cdot)\}$ of the underlying stochastic process $X(\cdot)$. Since
under model (1.2), Parseval's identity implies that the L2 distance between
$X_i$ and $X_j$ can be written as

\[ D(i,j) = \|X_i - X_j\|_2 = \left\{ \sum_{k=1}^{\infty} (\xi_{ik} - \xi_{jk})^2 \right\}^{1/2}, \]

we get

\[ \tilde{D}^2(i,j) = E\left\{ \sum_{k=1}^{\infty} (\xi_{ik} - \xi_{jk})^2 \,\Big|\, \mathbf{Y}_i, \mathbf{Y}_j \right\}. \]

For an integer $K \ge 1$, we then define truncated versions of $\tilde{D}$ as

\[
\begin{aligned}
\tilde{D}^{(K)}(i,j) &= \left\{ E\left( \sum_{k=1}^{K} (\xi_{ik} - \xi_{jk})^2 \,\Big|\, \mathbf{Y}_i, \mathbf{Y}_j \right) \right\}^{1/2} \\
&= \left\{ \sum_{k=1}^{K} \left[ \mathrm{var}(\xi_{ik} \mid \mathbf{Y}_i) + \mathrm{var}(\xi_{jk} \mid \mathbf{Y}_j) + (E(\xi_{ik} \mid \mathbf{Y}_i) - E(\xi_{jk} \mid \mathbf{Y}_j))^2 \right] \right\}^{1/2}.
\end{aligned}
\tag{2.2}
\]

Note that it follows from these definitions that $E(\tilde{D}^2(i,j)) = E(D^2(i,j))$ and
also for the truncated versions $E(\tilde{D}^{(K)}(i,j)^2) = \sum_{k=1}^{K} 2\lambda_k = E(D^{(K)}(i,j)^2)$,
so that these conditional expectations are unbiased predictors of the
corresponding squared L2 distances.
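The second line of (2.2) and the value $\sum_{k=1}^{K} 2\lambda_k$ both follow from elementary moment identities. A brief sketch for $i \ne j$, assuming (as in the model) that different realizations are independent, so that conditioning on $\mathbf{Y}_j$ carries no information about $\xi_{ik}$:

```latex
% Conditional second moment = conditional variance + squared conditional mean;
% the variance of the difference splits because subjects i and j are independent.
\begin{align*}
E\{(\xi_{ik}-\xi_{jk})^2 \mid \mathbf{Y}_i,\mathbf{Y}_j\}
 &= \mathrm{var}(\xi_{ik}-\xi_{jk} \mid \mathbf{Y}_i,\mathbf{Y}_j)
    + \{E(\xi_{ik}-\xi_{jk} \mid \mathbf{Y}_i,\mathbf{Y}_j)\}^2 \\
 &= \mathrm{var}(\xi_{ik} \mid \mathbf{Y}_i) + \mathrm{var}(\xi_{jk} \mid \mathbf{Y}_j)
    + \{E(\xi_{ik} \mid \mathbf{Y}_i) - E(\xi_{jk} \mid \mathbf{Y}_j)\}^2, \\
E(\xi_{ik}-\xi_{jk})^2
 &= \mathrm{var}(\xi_{ik}) + \mathrm{var}(\xi_{jk}) = 2\lambda_k .
\end{align*}
```

Summing the first identity over $k = 1, \ldots, K$ gives the bracketed sum in (2.2); taking expectations and summing the second gives the stated unbiasedness.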

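2.2. Estimation. In the following we discuss the estimation of the truncated
version of the distance $\tilde{D}^{(K)}(i,j)$ (2.2). Given an integer $K \ge 1$, let
$\Lambda^{(K)} = \mathrm{diag}\{\lambda_1, \ldots, \lambda_K\}$ be the $K \times K$ diagonal matrix with diagonal
elements $\{\lambda_1, \ldots, \lambda_K\}$. For $1 \le i \le n$, $1 \le k \le K$, let
$\boldsymbol{\mu}_i = (\mu(T_{i1}), \ldots, \mu(T_{in_i}))^T$, $\boldsymbol{\xi}_i^{(K)} = (\xi_{i1}, \ldots, \xi_{iK})^T$,
$\boldsymbol{\phi}_{ik} = (\phi_k(T_{i1}), \ldots, \phi_k(T_{in_i}))^T$ and $\Phi_i^{(K)} = (\boldsymbol{\phi}_{i1}, \ldots, \boldsymbol{\phi}_{iK})$.
Define

\[ \tilde{\boldsymbol{\xi}}_i^{(K)} = \Lambda^{(K)} (\Phi_i^{(K)})^T \Sigma_{\mathbf{Y}_i}^{-1} (\mathbf{Y}_i - \boldsymbol{\mu}_i), \tag{2.3} \]

where $\Sigma_{\mathbf{Y}_i} = \mathrm{cov}(\mathbf{Y}_i, \mathbf{Y}_i) = (C(T_{iu}, T_{iv}))_{1 \le u,v \le n_i} + \sigma^2 I_{n_i}$. Note that $\tilde{\boldsymbol{\xi}}_i^{(K)}$ is the best
linear unbiased predictor (BLUP) of $\boldsymbol{\xi}_i^{(K)}$, since $\mathrm{cov}(\boldsymbol{\xi}_i^{(K)}, \mathbf{Y}_i) = \Lambda^{(K)} (\Phi_i^{(K)})^T$.
Moreover, if we have a finite-dimensional process, such that for some integer
$K > 0$, $\lambda_k = 0$ for $k > K$ in model (1.2), then (omitting superscripts $K$)
$\Sigma_{\mathbf{Y}_i} = \Phi_i \Lambda \Phi_i^T + \sigma^2 I_{n_i}$ and
$\Lambda \Phi_i^T (\Phi_i \Lambda \Phi_i^T + \sigma^2 I_{n_i})^{-1} = (\Phi_i^T \Phi_i + \sigma^2 \Lambda^{-1})^{-1} \Phi_i^T$,
so that

\[ \tilde{\boldsymbol{\xi}}_i = (\Phi_i^T \Phi_i + \sigma^2 \Lambda^{-1})^{-1} \Phi_i^T (\mathbf{Y}_i - \boldsymbol{\mu}_i), \]

which also is the solution of the penalized least-squares problem

\[ \min_{\boldsymbol{\xi}} \; (\mathbf{Y}_i - \boldsymbol{\mu}_i - \Phi_i \boldsymbol{\xi})^T (\mathbf{Y}_i - \boldsymbol{\mu}_i - \Phi_i \boldsymbol{\xi}) + \sigma^2 \sum_{k=1}^{K} \xi_k^2 / \lambda_k . \]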
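For readers who want the intermediate steps, the matrix identity and the penalized least-squares characterization used in the finite-dimensional case can be checked directly. The sketch below omits superscripts $K$ and assumes $\lambda_k > 0$ for $k \le K$, so that $\Lambda^{-1}$ exists:

```latex
% Step 1: a product identity; step 2: invert both outer factors.
\begin{align*}
(\Phi_i^T\Phi_i + \sigma^2\Lambda^{-1})\,\Lambda\Phi_i^T
  &= \Phi_i^T\Phi_i\Lambda\Phi_i^T + \sigma^2\Phi_i^T
   = \Phi_i^T(\Phi_i\Lambda\Phi_i^T + \sigma^2 I_{n_i}), \\
\Lambda\Phi_i^T(\Phi_i\Lambda\Phi_i^T + \sigma^2 I_{n_i})^{-1}
  &= (\Phi_i^T\Phi_i + \sigma^2\Lambda^{-1})^{-1}\Phi_i^T .
\end{align*}
% Step 3: the gradient of the penalized criterion vanishes at the same vector.
\begin{align*}
-2\,\Phi_i^T(\mathbf{Y}_i - \boldsymbol{\mu}_i - \Phi_i\boldsymbol{\xi})
  + 2\sigma^2\Lambda^{-1}\boldsymbol{\xi} = 0
\;\Longleftrightarrow\;
(\Phi_i^T\Phi_i + \sigma^2\Lambda^{-1})\,\boldsymbol{\xi}
  = \Phi_i^T(\mathbf{Y}_i - \boldsymbol{\mu}_i).
\end{align*}
```

The penalized form shows that components with small eigenvalues are shrunk strongly toward zero, which is what stabilizes the predictor when only one or two observations per trajectory are available.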

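If one assumes normality of the processes in models (1.1) and (1.2), that
is, $\xi_{ik} \sim N(0, \lambda_k)$ and $\varepsilon_{il} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$, and independence between errors and
processes, the joint distribution of $\{\mathbf{Y}_i, \boldsymbol{\xi}_i^{(K)}\}$ is multivariate normal with

\[
\begin{pmatrix} \mathbf{Y}_i \\ \boldsymbol{\xi}_i^{(K)} \end{pmatrix}
\sim \text{Normal}\left(
\begin{pmatrix} \boldsymbol{\mu}_i \\ \mathbf{0} \end{pmatrix},
\begin{pmatrix} \Sigma_{\mathbf{Y}_i} & \Phi_i^{(K)} \Lambda^{(K)} \\ \Lambda^{(K)} (\Phi_i^{(K)})^T & \Lambda^{(K)} \end{pmatrix}
\right).
\tag{2.4}
\]

Therefore, the conditional distribution of $\boldsymbol{\xi}_i^{(K)}$ given $\mathbf{Y}_i$ is normal with mean

\[ E(\boldsymbol{\xi}_i^{(K)} \mid \mathbf{Y}_i) = \Lambda^{(K)} (\Phi_i^{(K)})^T \Sigma_{\mathbf{Y}_i}^{-1} (\mathbf{Y}_i - \boldsymbol{\mu}_i) = \tilde{\boldsymbol{\xi}}_i^{(K)} \tag{2.5} \]

and variance

\[ \mathrm{var}(\boldsymbol{\xi}_i^{(K)} \mid \mathbf{Y}_i) = \Lambda^{(K)} - \Lambda^{(K)} (\Phi_i^{(K)})^T \Sigma_{\mathbf{Y}_i}^{-1} \Phi_i^{(K)} \Lambda^{(K)}. \tag{2.6} \]

Furthermore, $\tilde{\boldsymbol{\xi}}_i^{(K)}$ becomes the best predictor of $\boldsymbol{\xi}_i^{(K)}$, and with (2.5) and (2.6),

\[
\begin{aligned}
(\tilde{D}^{(K)}(i,j))^2
&= \mathrm{tr}\big(\Lambda^{(K)} - \Lambda^{(K)} (\Phi_i^{(K)})^T \Sigma_{\mathbf{Y}_i}^{-1} \Phi_i^{(K)} \Lambda^{(K)}\big) \\
&\quad + \mathrm{tr}\big(\Lambda^{(K)} - \Lambda^{(K)} (\Phi_j^{(K)})^T \Sigma_{\mathbf{Y}_j}^{-1} \Phi_j^{(K)} \Lambda^{(K)}\big) \\
&\quad + \big\| \Lambda^{(K)} (\Phi_i^{(K)})^T \Sigma_{\mathbf{Y}_i}^{-1} (\mathbf{Y}_i - \boldsymbol{\mu}_i) - \Lambda^{(K)} (\Phi_j^{(K)})^T \Sigma_{\mathbf{Y}_j}^{-1} (\mathbf{Y}_j - \boldsymbol{\mu}_j) \big\|_2^2 .
\end{aligned}
\tag{2.7}
\]

Therefore, $\tilde{D}^{(K)}(i,j)$ can then be estimated by plugging in estimates for the
model components, that is, for mean curve $\mu(\cdot)$, covariance kernel $C(\cdot,\cdot)$, first
$K$ eigenvalues $\{\lambda_k : k = 1, \ldots, K\}$ and corresponding eigenfunctions
$\{\phi_k : k = 1, \ldots, K\}$, as well as error variance $\sigma^2$. Although (2.7) is derived under the
normality assumption, its expectation is always equal to the expectation of
$D^{(K)}(i,j)^2$ (which is $\sum_{k=1}^{K} 2\lambda_k$), regardless of distributional assumptions.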
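A minimal sketch of how (2.5), (2.6) and (2.7) combine for two sparsely observed subjects is given below. It is not the authors' supplementary code. It assumes that the model components are known rather than estimated, and that the covariance at the observation times is the rank-K form $\Phi_i \Lambda \Phi_i^T + \sigma^2 I$ (the finite-dimensional case discussed above). The helper names `solve`, `blup` and `conditional_distance` are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting (adequate for the tiny n_i of sparse data)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[r][n] / m[r][r] for r in range(n)]


def blup(times, obs, mu, lambdas, phis, sigma2):
    """Return (predicted scores as in (2.5), trace of (2.6)) for one subject."""
    n, K = len(times), len(lambdas)
    P = [[phis[k](t) for k in range(K)] for t in times]  # n x K matrix Phi_i
    # Sigma_{Y_i} = Phi Lambda Phi^T + sigma^2 I (rank-K assumption).
    S = [[sum(lambdas[k] * P[u][k] * P[v][k] for k in range(K))
          + (sigma2 if u == v else 0.0) for v in range(n)] for u in range(n)]
    resid = [y - mu(t) for t, y in zip(times, obs)]
    w = solve(S, resid)  # Sigma^{-1} (Y_i - mu_i)
    scores = [lambdas[k] * sum(P[u][k] * w[u] for u in range(n))
              for k in range(K)]
    trace = 0.0
    for k in range(K):
        col = [P[u][k] for u in range(n)]
        v = solve(S, col)  # Sigma^{-1} phi_ik
        trace += lambdas[k] - lambdas[k] ** 2 * sum(c * x for c, x in zip(col, v))
    return scores, trace


def conditional_distance(subj_i, subj_j, mu, lambdas, phis, sigma2):
    """Square root of (2.7) for two distinct subjects, each given as (times, obs)."""
    xi_i, tr_i = blup(*subj_i, mu, lambdas, phis, sigma2)
    xi_j, tr_j = blup(*subj_j, mu, lambdas, phis, sigma2)
    sq = tr_i + tr_j + sum((a - b) ** 2 for a, b in zip(xi_i, xi_j))
    return math.sqrt(sq)
```

The formula presumes two distinct subjects; for $i = j$ the distance is 0 by Proposition 2.1. Note that the two trace terms do not vanish even when the predicted scores coincide: they reflect the uncertainty left by sparse sampling and shrink as more observations per subject become available.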

Assuming that mean, covariance and eigenfunctions are all smooth, one
can apply local linear smoothers [Fan and Gijbels (1996)], pooling observations
for function and surface estimation, fitting local lines in one dimension
for estimating the mean function and local planes in two dimensions for
estimation of the covariance kernel [see Yao, Müller and Wang (2005) for
details]. Denoting the resulting estimates of $\mu(\cdot)$, $C(\cdot,\cdot)$ by $\hat{\mu}(\cdot)$, $\hat{C}(\cdot,\cdot)$, the
estimates of eigenfunctions and eigenvalues are given by the solutions $\hat{\phi}_k$
and $\hat{\lambda}_k$ of the eigen-equations

\[ \int_{\mathcal{T}} \hat{C}(s,t) \hat{\phi}_k(s) \, ds = \hat{\lambda}_k \hat{\phi}_k(t), \]

where this system of equations is solved by discretizing the smoothed covariance
[Rice and Silverman (1991)], followed by a projection on the space of
symmetric and nonnegative definite covariance surfaces [Yao et al. (2003)].
The estimate $\hat{\sigma}^2$ of $\sigma^2$ is obtained by first subtracting $\hat{C}(t,t)$ from a local
linear smoother of $C(t,t) + \sigma^2$, denoted by $\hat{V}(t)$, then averaging over a subset
of $\mathcal{T}$ [Yao, Müller and Wang (2005)]. Further details can be found in the
Appendix.
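The discretization step for the eigen-equations can be sketched as follows. This is an illustration under assumptions, not the procedure of the cited references: it uses a simple Riemann-sum quadrature on an equally spaced grid, finds only the leading eigenpair by power iteration, and omits the projection onto nonnegative definite surfaces. The function name `leading_eigenpair` is hypothetical, and the Brownian-motion kernel min(s, t) is used only because its leading eigenvalue on [0, 1] is known to be 4/pi^2.

```python
import math


def leading_eigenpair(cov, grid, iters=100):
    """Leading eigenvalue and eigenfunction of the integral operator with
    kernel `cov`, discretized on an equally spaced `grid`."""
    h = grid[1] - grid[0]
    n = len(grid)
    C = [[cov(s, t) for t in grid] for s in grid]

    def apply(f):
        # Riemann-sum approximation of the integral of C(s, t) f(s) ds.
        return [h * sum(C[a][b] * f[a] for a in range(n)) for b in range(n)]

    phi = [1.0] * n
    for _ in range(iters):
        new = apply(phi)
        norm = math.sqrt(h * sum(v * v for v in new))  # discrete L2 norm
        phi = [v / norm for v in new]
    # Rayleigh quotient of the normalized iterate.
    lam = h * sum(p * q for p, q in zip(phi, apply(phi)))
    return lam, phi


grid = [(a + 0.5) / 50 for a in range(50)]  # midpoints of 50 cells on [0, 1]
lam, phi = leading_eigenpair(min, grid)
print(round(lam, 3), round(4 / math.pi ** 2, 3))
```

In practice the kernel would be the smoothed covariance estimate evaluated on the grid, and a full symmetric eigendecomposition would deliver all K leading components at once.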

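The estimate of $\tilde{D}^{(K)}(i,j)$ is then given by

\[
\begin{aligned}
\hat{D}^{(K)}(i,j) = \Big\{
& \mathrm{tr}\big(\hat{\Lambda}^{(K)} - \hat{\Lambda}^{(K)} (\hat{\Phi}_i^{(K)})^T \hat{\Sigma}_{\mathbf{Y}_i}^{-1} \hat{\Phi}_i^{(K)} \hat{\Lambda}^{(K)}\big) \\
& + \mathrm{tr}\big(\hat{\Lambda}^{(K)} - \hat{\Lambda}^{(K)} (\hat{\Phi}_j^{(K)})^T \hat{\Sigma}_{\mathbf{Y}_j}^{-1} \hat{\Phi}_j^{(K)} \hat{\Lambda}^{(K)}\big) \\
& + \big\| \hat{\Lambda}^{(K)} (\hat{\Phi}_i^{(K)})^T \hat{\Sigma}_{\mathbf{Y}_i}^{-1} (\mathbf{Y}_i - \hat{\boldsymbol{\mu}}_i)
 - \hat{\Lambda}^{(K)} (\hat{\Phi}_j^{(K)})^T \hat{\Sigma}_{\mathbf{Y}_j}^{-1} (\mathbf{Y}_j - \hat{\boldsymbol{\mu}}_j) \big\|_2^2 \Big\}^{1/2},
\end{aligned}
\tag{2.8}
\]

where $\hat{\Lambda}^{(K)} = \mathrm{diag}\{\hat{\lambda}_1, \ldots, \hat{\lambda}_K\}$, $\hat{\Phi}_i^{(K)} = (\hat{\boldsymbol{\phi}}_{i1}, \ldots, \hat{\boldsymbol{\phi}}_{iK})$, and the $(l, l')$ entry
of $\hat{\Sigma}_{\mathbf{Y}_i}$ is $(\hat{\Sigma}_{\mathbf{Y}_i})_{l,l'} = \hat{C}(T_{il}, T_{il'}) + \hat{\sigma}^2 \delta_{ll'}$. The computer code to calculate (2.8)
based on data is given as Supplementary material [Peng and Müller (2008)].
To obtain functional principal component scores for sparse data, one can also
use the matlab package PACE
(http://anson.ucdavis.edu/~mueller/data/programs.html).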

The following result shows the consistency of these estimates for the target
distance $\tilde{D}(i,j)$, providing some assurance that the estimated distance is
close to the targeted one if enough components are included and the number
of observed random curves is large enough.

Theorem 2.1. Under the assumptions listed in Lemma A.2 in the Appendix,

\[ \lim_{K \to \infty} \lim_{n \to \infty} \hat{D}^{(K)}(i,j) = \tilde{D}(i,j) \quad \text{in probability.} \]

Proof. See Appendix. □

2.3. Distance-based scaling. Multidimensional scaling (MDS) aims to find
a projection of given original objects, for which one has a distance matrix,
into $p$-dimensional (Euclidean) space for any $p \ge 1$, often chosen as $p = 2$
or 3, which provides the best visualization. The projected points in $p$-space
represent the original objects (e.g., random curves) in such a way that their
distances match with the original distances or dissimilarities $\{\delta_{ij}\}$, according
to some target criterion. In our setting these original distances will be
the estimated conditional L2 distances (2.8) between the sparsely observed
random trajectories. Various techniques exist for implementing the MDS
projection, including metric and nonmetric scaling.

In classical metric scaling one treats dissimilarities $\{\delta_{ij}\}$ directly as
Euclidean distances and then uses the spectral decomposition of a doubly
centered matrix of dissimilarities [Cox and Cox (2001)]. It is well known that
there is an equivalence between principal components analysis and classical
scaling when dissimilarities are truly Euclidean distances (if the subjects
are points in a Euclidean space). Metric least squares scaling finds
configuration points $\{x_i\}$ in a $p$-dimensional space with distances $\{d_{ij}\}$ matching
$\{\delta_{ij}\}$ as closely as possible, by minimizing a loss function $S$, for example,
$S = \sum_{i<j} \delta_{ij}^{-1} (d_{ij} - \delta_{ij})^2 / \sum_{i<j} \delta_{ij}$ [Sammon (1969)]. Two other popular
optimality criteria for metric MDS are metric stress and s-stress, which
are special cases of the criterion

\[ \text{minimize} \quad \sum_{i<j} w_{ij} \big[ (d_{ij}^2)^r - (\delta_{ij}^2)^r \big]^2, \]

usually implemented with $w_{ij} = 1$. The stress criterion corresponds to the
case $r = 1/2$ and was originally proposed by Kruskal (1964) for nonmetric
MDS, while the s-stress criterion corresponds to $r = 1$ and was popularized
by Takane, Young and DeLeeuw (1977).

The s-stress criterion leads to a smooth minimization problem in con- 
trast to stress. Kearsley, Tapia and Trosset (1998) applied Newton's method 
to find solutions using these criteria. In practice, the stress criterion is of- 
ten normalized by the sum of squares of the dissimilarities, thus becoming 
scale free. Similarly, the s-stress criterion is normalized with the sum of 
the 4th powers of the dissimilarities. In the following implementation of our 
approach, we use metric MDS with the s-stress criterion for visualizing 
sparse and irregularly observed longitudinal data, where the MDS input 
distances are the estimates $\hat{D}^{(K)}$ (2.8) of the proposed distances $\tilde{D}^{(K)}$ (2.7).
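The stress family discussed in this section is easy to evaluate for a given configuration. The sketch below only computes the criterion with unit weights by default; it does not perform the minimization that an MDS routine carries out, and the function names are hypothetical. The normalization follows the description in the text (sum of 4th powers of the dissimilarities for s-stress).

```python
import math


def pairwise(points):
    """Euclidean distances d_ij for i < j, keyed by the index pair."""
    return {(i, j): math.dist(points[i], points[j])
            for i in range(len(points)) for j in range(i + 1, len(points))}


def power_stress(points, delta, r, weights=None):
    """Sum over i < j of w_ij * [(d_ij^2)^r - (delta_ij^2)^r]^2.
    r = 0.5 gives raw stress, r = 1 gives raw s-stress."""
    total = 0.0
    for key, dij in pairwise(points).items():
        w = 1.0 if weights is None else weights[key]
        total += w * ((dij ** 2) ** r - (delta[key] ** 2) ** r) ** 2
    return total


def normalized_sstress(points, delta):
    """s-stress divided by the sum of 4th powers of the dissimilarities."""
    return power_stress(points, delta, r=1.0) / sum(v ** 4 for v in delta.values())


# Toy configuration: a 3-4-5 right triangle, with one dissimilarity off by 1.
points = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
delta = {(0, 1): 6.0, (0, 2): 3.0, (1, 2): 4.0}
print(power_stress(points, delta, r=0.5), power_stress(points, delta, r=1.0))
```

Because s-stress compares squared distances, the criterion is a polynomial in the coordinates, which is why it yields a smooth minimization problem, whereas stress involves square roots that are not differentiable at coincident points.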

3. Clustering bidders in online auctions. Most eBay auctions are
second-price closed-ended auctions with proxy bidding, as explained in the
Introduction. In such auctions bidders submit the maximum amount that
they are willing to pay (WTP) for the item. If in an ongoing auction a bidder
submits a WTP amount that is higher than all previously submitted WTP 
amounts plus a minimum increment, then the bidder becomes the leading 
bidder. The winner of the auction is the leading bidder at the time the 
auction closes. The price that the winning bidder pays for the item is the 
second highest WTP plus the increment amount (second price), where the 
increments depend on the price level already reached. The duration of the 
auction is pre-determined (seven days for the auctions we consider here), 
and a bidder can place arbitrarily many bids at any WTP level and at any 
time while the auction is in progress [Shmueli and Jank (2005)]. During an 
auction in progress, all except the highest WTP amount are disclosed at 
current time, so that the "current price" observed for that auction at any 
given time is the second highest WTP value plus an increment. The first bid 
(opening bid) in an auction is set by the seller at the start of the auction 
and is the initial required bid amount. 

We study classification of bidding behaviors using the proposed distance 
measure with eBay online auction data for 158 seven-day auctions of Palm
M515 Personal Digital Assistants (PDA) that took place between March and
May, 2003. These data are publicly available at
http://www.rhsmith.umd.edu/ceme/statistics/data.html. One auction which did not contain bidder ID
information was removed, and the analysis reported here is based on the 157 
remaining auctions. These data contain recorded bids and their respective 
times (where time origin is always the beginning of an auction) as well as 
WTP amounts for 1122 distinct bidders, 1818 different bidder trajectories 
(corresponding to the bids submitted by one bidder in one auction), for 
a total of 3643 submitted bids (WTP amounts). One bidder can generate 
a bidder trajectory in several different auctions and one auction usually 
consists of several bidder trajectories by several different bidders.
Among the 1818 observed bidder trajectories, 1046 consist of only one
bid, and 361 of two bids. The average number of bids in a bidder trajectory 
is 2 with a standard deviation of 1.82. It is quite typical in eBay auctions 
that most bidders submit only very few bids. This results in the sparseness 
of observable bids corresponding to one bidder trajectory, and furthermore, 
the bids are irregularly placed during an ongoing auction. Individual bidder 
trajectories are shown in Figure 2, where the observed WTP amounts are 
simply connected by straight lines. The $i$th bidder trajectory consists of the
series of bid times $T_{ij}$, $j = 1, \ldots, n_i$ (measured in hours, where the origin of
time corresponds to the beginning of the auction), and the corresponding
WTP values $Y_{ij}$ (measured in dollars). The bidder trajectory is assumed to
be generated by a latent continuous random trajectory of which the observed 
bids are just a snapshot. Given these bidder trajectories for the n = 1122 
bidders, we aim at an empirical classification of the bidding strategies that 
bidders employ. 

The mean trajectory for the observed 1818 bidder trajectories is obtained 
by pooling all data as described in the Appendix. In a preprocessing step, 
using the pooled data, the residuals from the overall mean curve are calcu- 
lated, and the standard deviation calculated from all such residuals is then 
used to remove bidder trajectories that contain outlying bids. These are 
defined as bids that fall outside of three standard deviations of the mean 
curve. We found 47 outlying bids in 17 of the bidder trajectories which were 
removed, leaving n = 1801 bidder trajectories from 1112 distinct bidders and 
all 157 auctions in the analysis. The total remaining number of bids is 3596. 
The outliers were removed to assure more robust estimation of mean curve 
and covariance surface. As there are only 17 bidder trajectories removed, 
the final clustering results are not much affected by removing the outliers. 
The range of bid times is from 0.18h to 168h, where the time domain of
each auction is [0, 168h] (i.e., seven-day auctions). The data are then
modeled as generated from 1801 realizations of an underlying latent stochastic
bid price process, as described in Section 1. The mean trajectory (Figure 2) 
and covariance surface of this bid price process are estimated as described 
in the Appendix, with bandwidths selected by leave-one-curve-out cross val- 
idation. Even though bidder trajectories from the same auction or from the 
same bidder are most likely dependent to some extent, we do not expect the 
mild dependence to have much impact on the estimation procedures, and 
previous analyses such as the one reported in Bapna et al. (2004) also have 
used all available bidder trajectories. 
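The preprocessing rule described in this paragraph can be sketched as below. This is an illustration, not the authors' code: the mean curve is assumed to be available as a function (in the paper it is a smoothed estimate from pooled data), the pooled standard deviation is the sample standard deviation of all residuals, and the function name is hypothetical.

```python
import statistics


def drop_outlying_trajectories(trajectories, mean_curve, n_sd=3.0):
    """Drop every trajectory containing a bid farther than n_sd pooled
    residual standard deviations from the mean curve.

    trajectories: list of (times, bids) pairs, one per bidder trajectory.
    mean_curve:   callable t -> mean WTP at time t (assumed given).
    """
    residuals = [y - mean_curve(t)
                 for times, bids in trajectories
                 for t, y in zip(times, bids)]
    sd = statistics.stdev(residuals)  # pooled over all bids

    def is_clean(times, bids):
        return all(abs(y - mean_curve(t)) <= n_sd * sd
                   for t, y in zip(times, bids))

    return [(times, bids) for times, bids in trajectories
            if is_clean(times, bids)]
```

A whole trajectory is removed as soon as one of its bids is outlying, which matches the counts reported in the text (47 bids in 17 removed trajectories).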

Fig. 2. Individual bidder trajectories and mean bidder trajectory (black solid line) for
1801 bidder trajectories in 157 eBay online auctions. Individual trajectories are colored
according to their cluster membership (see Figure 3 for the correspondence between color
and cluster).

We chose $K = 5$ components when estimating the conditional L2 distance
$\tilde{D} = (\tilde{D}(i,j))_{1 \le i,j \le 1801}$ [see equation (2.1)], based on the fact that these
accounted for about 95% of total variation of the random trajectories. The
estimated first five eigenvalues are $191.4 \times 10^3$, $30.75 \times 10^3$, $5.382 \times 10^3$,
$4.183 \times 10^3$ and $2.111 \times 10^3$, explaining 78.6%, 12.6%, 2.21%, 1.72% and 0.87%
of the sum of the first 100 eigenvalues, respectively. The estimated error
variance is 697, where all amounts are in U.S. dollars. Multidimensional 
scaling (MDS) was then applied to the distance matrix D, projecting into 
a space of dimension p = 2, and using the matlab function mdscale. For 
the goodness of fit criterion that the MDS algorithm minimizes, we consid- 
ered the criteria sammon, metricstress and metricsstress. Among these, 
only metricsstress converged for the data at hand. The MDS projection 
result is displayed in Figure 3. This figure clearly reveals that the bidder 
trajectories can be separated into distinct subgroups. 

Applying K-means cluster analysis to the metricsstress MDS results, 
six clusters of bidder trajectories can be identified (Figure 3). We label these 
clusters as "L" (Low end), "H" (High end), "S" (Slow start), "F" (Fast start, 
slow increase), "A" (Aggressive) and "E" (Early), respectively. Each cluster 
consists of a number of bidder trajectories ranging between 212 and 399. 
One finds that bidders with bidder trajectories in clusters "L", "H" and "E" 
tend to place relatively fewer bids than those in clusters "S" and "A" , with 
those in cluster "F" placing a moderate number of bids (Table 1). Further 
characteristics of these six clusters regarding bid times, bid amounts and 
chances of winning are summarized in Table 1. 
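The clustering step operates on the two-dimensional MDS coordinates. The sketch below is plain Lloyd-type K-means in pure Python, written for illustration; the source does not specify which implementation or initialization was used, so the seeding by random sampling and the function name `kmeans` are assumptions.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm for points given as a list of coordinate tuples.
    Returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center in Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Toy MDS output with two well-separated groups.
coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(coords, k=2)
print(labels)
```

For the auction data one would call such a routine with k = 6 on the 1801 projected points; since K-means depends on its starting values, several random restarts are advisable.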

Figures 4 and 5 display the line connected individual bidder trajectories 
and the fitted mean trajectories for the six clusters of bidder trajectories. 






Fig. 3. Multidimensional scaling (MDS) using the criterion metricsstress applied to 
the proposed conditional functional distance and projecting to two-dimensional space. Also 
shown is the K-means clustering result which reveals six clusters: "L" (Low end), "H" 
(High end), "S" (Slow start), "F" (Fast start, slow increase), "A" (Aggressive) and "E" 
(Early). 



Table 1
Summary statistics for the six clusters

Group     Number of      Average        Bid times          Bid amounts        Number of      Average amount
ID        bidder         number of      in hours           in dollars         winners (%)    paid at winning
          trajectories   bids (std)     (Q1, m, Q3)*       (Q1, m, Q3)*                      in dollars (std)

"L"       386            1.80 (1.60)    (132, 158, 166)    (125, 175, 200)    29 (7.51%)     208 (10.8)
"H"       399            1.76 (1.36)    (155, 165, 168)    (200, 220, 233)    97 (24.3%)     237 (11.2)
"S"       298            2.51 (2.39)    (83, 124, 152)     (45, 77, 122)      2 (0.67%)      177 (10.4)
"F"       212            1.97 (1.93)    (22, 58, 110)      (80, 101, 140)     0 (0%)         NA (NA)
"A"       285            2.19 (1.65)    (75, 123, 149)     (157, 187, 215)    26 (9.12%)     245 (16.5)
"E"       221            1.85 (1.94)    (8.4, 22, 54)      (13, 35, 61)       1 (0.45%)      230 (NA)
Overall   1801           2 (1.82)       (65, 136, 162)     (77, 150, 200)     155 (8.6%)     232 (18.3)

* (Q1, m, Q3) stands for (first quartile, median, third quartile).

Fig. 4. Individual bidder trajectories for each of the six clusters.



The histograms of the bid time and bid amount for each cluster are visualized 
in Figures 6 and 7. One finds that "L" and "H" bidders tend to place bids 
over a short period of time near the end of an auction, and participate in 
"bid sniping." As can be seen from Figure 7, near the end of an auction, "L" 
bidders place relatively low bids, so these bidders aim at a good bargain, 
while "H" bidders place relatively high bids, so the primary interest of these 
bidders seems to be to secure the item, while price plays a secondary role. In 
contrast to "L" and "H" bidders, "E" bidders tend to bid at the beginning 
of an auction, placing low bids, and then refrain from placing subsequent 
competitive bids. "S", "F" and "A" bidders tend to place bids throughout 
the entire auction. Among these, "S" bidders tend to slowly raise their bids 
at the beginning and then increase them faster toward the end of the auction, 
giving rise to a convex mean curve. "F" bidders, on the other hand, place 
relatively high bids at the beginning (mostly starting around 50 dollars), 
and from then on only cautiously raise their bids, ending their bidding at 
a fairly modest price level. Their mean bidder trajectory is almost linear. 
While "A" bidders also start with high bids at the beginning, they increase 
their bids more aggressively compared to the "F" bidders throughout the 
auction and end up at a high price level. 



Fig. 5. Mean bidder trajectory (black solid line) for each of the six clusters.
[Panels: Cluster L, Cluster H, Cluster S, Cluster A, Cluster F, Cluster E;
horizontal axis: time after auction started (in hours).]



It is instructive to study the winning rate for each cluster. The winning 
rate for a given cluster is defined as the proportion of the winning trajecto- 
ries, that is, the fraction of the bidders in the cluster winning the auction. 
We find that "H" bidders have the highest winning rate: around 24.3%; this 
is not too surprising as these bidders tend to place high bids near the end of 
an auction. The success rates of "A" bidders and "L" bidders are lower and 
comparable to each other, 9.1% and 7.5%, respectively. The chance of win- 
ning the auction is very slim for the remaining three groups. Indeed, "S" and 
"E" groups only include 2 and 1 winners respectively, and there is no winner 
in the "F" group. Examining the average prices paid by the winners, we find that
"A" bidders pay the most, 245 dollars, while "H" bidders pay slightly less,
237 dollars. "L" bidders pay considerably less, 208 dollars, while "E" bidders
and "S" bidders pay 230 dollars and 177 dollars, respectively. For more details, see Table
1.
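
The winning rate defined above can be computed directly from cluster labels and win indicators. The following is a minimal Python sketch for illustration only; the function name `winning_rates` and the toy inputs are assumptions made here and are not the eBay data or the code in the supplement.

```python
from collections import defaultdict


def winning_rates(labels, won):
    """Return {cluster: fraction of bidders in that cluster who won}."""
    counts = defaultdict(int)
    wins = defaultdict(int)
    for cluster, is_winner in zip(labels, won):
        counts[cluster] += 1
        wins[cluster] += int(is_winner)
    return {cluster: wins[cluster] / counts[cluster] for cluster in counts}


# Hypothetical toy input, NOT the auction data analyzed in the paper.
labels = ["H", "H", "H", "H", "L", "L", "E", "E"]
won = [True, False, False, False, True, False, False, False]
rates = winning_rates(labels, won)
```

Each bidder trajectory contributes once to its cluster, so the rate is a simple proportion within the cluster.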

Our results confirm the effectiveness of the "bid sniping" strategy in eBay
online auctions. If the goal is to secure the item, then "H" emerges as the
best strategy. If, however, the goal is to win the item at a low price with
a reasonable chance, then "L" is the best strategy. Since both strategies
involve fewer bids, they are also best in terms of the effort
invested in an auction by a bidder.

Fig. 6. Histogram of times when bids are placed for each of the six clusters.

4. Discussion. Bapna et al. (2004) proposed to use clustering of bidders 
based on three summary statistics: time of entry, time of exit and number 
of bids. It is well known that the beginning and ending of an online auction 
usually attract more bids than the middle period and these three statistics 
can be expected to partially reflect the timing of the bidding. Clustering 
based on these three statistics has been applied to multiunit Yankee auctions 
which use a format that is quite different from the format used in eBay 
auctions. These statistics have been shown to lead to sensible and insightful 
clustering results for this format. However, these statistics do not reflect 
the actual bid amounts placed during an auction, and one can expect that 
including the amounts adds valuable information that leads to improved 
clustering. 
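
To make the limitation concrete, the three timing statistics can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `timing_summary` and both toy bidders are hypothetical and are not taken from Bapna et al. (2004) or from the eBay data.

```python
def timing_summary(bid_times):
    """Return (time of entry, time of exit, number of bids) for one bidder."""
    if not bid_times:
        raise ValueError("a bidder must have at least one bid")
    return (min(bid_times), max(bid_times), len(bid_times))


# Hypothetical toy bidders: times in hours after the auction start,
# amounts in dollars. Both bid at the same times but at different levels.
low_sniper = {"times": [166.0, 167.5], "amounts": [150.0, 180.0]}
high_sniper = {"times": [166.0, 167.5], "amounts": [220.0, 250.0]}

summary_low = timing_summary(low_sniper["times"])
summary_high = timing_summary(high_sniper["times"])

# The timing summaries coincide although the bid amounts differ.
same_timing = summary_low == summary_high
```

Two bidders with identical bid times receive identical summaries, whatever their bid amounts, so a clustering based only on these statistics cannot separate them.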

In fact, the six bidding strategies we identified are differentiated not only
by the timing of the bids, but even more so by the bid
amounts and especially by the interaction between the two, namely, at which
times lower bids, higher bids or no bids at all are submitted. These
interactions carry important information that characterizes bidding strategies.
For example, when ignoring the actual bid amounts, one could not
distinguish between "L" and "H" bidders, as they show quite similar patterns
in terms of timing when bids are placed, but differ substantially in
the amounts of the bids placed. The functional viewpoint that we advocate
reveals more features of a particular strategy, as it allows a natural description
of the interactions between the timing of bids and their amounts, for
example, in the characterization of the "S", "A" and "F" clusters.

Fig. 7. Histogram of bid (willingness-to-pay) amounts for each of the six clusters.

The proposed functional distance is defined conditionally on observed
"snapshots" of the underlying trajectories. It provides a useful tool and
solves the problem of defining a distance when the data are sparse and
irregular. After the distances for such data have been obtained, one can apply 
standard tools such as MDS and clustering methods to classify such data. 
We demonstrated that the estimated distances converge to target values 
under suitable assumptions. 
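
Once a matrix of pairwise distances is available, any clustering method that accepts precomputed distances can be applied. As a hedged illustration, the sketch below uses a simple k-medoids iteration. This is a swapped-in technique chosen because it needs only the distance matrix; it is not claimed to be the clustering procedure used in our analysis. The function name `k_medoids`, the initialization by the first `k` subjects and the toy distance matrix are all assumptions made for this example.

```python
def assign(dist, medoids):
    """Assign each subject to the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternate assignment and medoid updates on a precomputed distance matrix."""
    medoids = list(range(k))  # assumption: start from the first k subjects
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster becomes empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(dist, medoids), medoids


# Hypothetical toy distances: four subjects located at 0, 1, 10 and 11 on a line.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(a - b) for b in points] for a in points]
labels, medoids = k_medoids(dist, k=2)
```

In practice the entries of `dist` would be the estimated conditional distances between pairs of subjects, and the number of clusters would have to be chosen separately.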

Applying these methods to online auction data leads to insights on bid- 
ding strategies. For the second-price closed-ended eBay online auction data 
that we have considered, bidding behaviors fall into six distinct clusters, each
associated with specific characteristics of the corresponding bidder trajectories.
Each of these bidding strategies is associated with a distinct chance
to win the item that is auctioned, and also with a distinct final price that 
the winner of the auction has to pay for the item. It turns out that the six 
bidding strategies can be clearly distinguished in terms of how well they 
achieve these goals. The strategy of placing bids near the end of an auc- 
tion at moderate bid levels ("L" cluster) is most effective when one aims at 
combining a reasonable chance of winning with a relatively low price. The 
proposed methodology is more generally useful for all longitudinal studies 
where clustering of subjects is of interest and data are sparse and irregular. 

APPENDIX 

Proof of Proposition 2.1. The first two properties are obvious from
the definition of $\tilde{D}$, while the third one can be easily proved by applying
the Cauchy-Schwarz inequality. Note that $D$ is a metric, thus satisfying the
triangle inequality: $D(i,j) \le D(i,k) + D(k,j)$. Therefore,

$$D^2(i,j) \le D^2(i,k) + D^2(k,j) + 2D(i,k)D(k,j).$$

Since $E(D^2(i,j) \mid \mathbf{Y}_i, \mathbf{Y}_j, \mathbf{Y}_k) = E(D^2(i,j) \mid \mathbf{Y}_i, \mathbf{Y}_j) = \tilde{D}^2(i,j)$, we then have

$$\tilde{D}^2(i,j) \le \tilde{D}^2(i,k) + \tilde{D}^2(k,j) + 2E(D(i,k)D(k,j) \mid \mathbf{Y}_i, \mathbf{Y}_j, \mathbf{Y}_k)$$

and by the Cauchy-Schwarz inequality,

$$E(D(i,k)D(k,j) \mid \mathbf{Y}_i, \mathbf{Y}_j, \mathbf{Y}_k) \le \{E(D^2(i,k) \mid \mathbf{Y}_i, \mathbf{Y}_j, \mathbf{Y}_k)\}^{1/2} \{E(D^2(k,j) \mid \mathbf{Y}_i, \mathbf{Y}_j, \mathbf{Y}_k)\}^{1/2} = \tilde{D}(i,k)\tilde{D}(j,k).$$

This concludes the proof. □

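Model fitting. Following Yao, Müller and Wang (2005), we fit the model
by local linear smoothers based on pooled data. For the mean curve $\mu(\cdot)$,
applying weighted least squares, one obtains

$$(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} \sum_{j=1}^{n_i} K_1\left(\frac{T_{ij} - t}{h_\mu}\right) \{Y_{ij} - \beta_0 - \beta_1 (t - T_{ij})\}^2, \tag{A.1}$$

where $K_1$ is a one-dimensional kernel, for example, the univariate Epanechnikov
kernel $K_1(x) = \frac{3}{4}(1 - x^2) I_{[-1,1]}(x)$, and $h_\mu$ is the bandwidth which can
be selected by leave-one-curve-out cross validation. The resulting estimate
is $\hat{\mu}(t) = \hat{\beta}_0(t)$. The covariance kernel $C(\cdot,\cdot)$ is fitted by two-dimensional
weighted least squares,

$$(\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2) = \arg\min_{\beta} \sum_{i=1}^{n} \sum_{1 \le j \ne l \le n_i} K_2\left(\frac{T_{ij} - s}{h_G}, \frac{T_{il} - t}{h_G}\right) \{G_i(T_{ij}, T_{il}) - f(\beta, s, t, T_{ij}, T_{il})\}^2, \tag{A.2}$$

where $G_i(T_{ij}, T_{il}) = (Y_{ij} - \hat{\mu}(T_{ij}))(Y_{il} - \hat{\mu}(T_{il}))$, $f(\beta, s, t, T_{ij}, T_{il}) = \beta_0 + \beta_1 (s - T_{ij}) + \beta_2 (t - T_{il})$,
$K_2$ is a two-dimensional kernel, for example, the bivariate
Epanechnikov kernel $K_2(x,y) = \frac{9}{16}(1 - x^2)(1 - y^2) I_{[-1,1]}(x) I_{[-1,1]}(y)$, and $h_G$
is a bandwidth which can be selected by leave-one-curve-out cross validation.
Then $\hat{C}(s,t) = \hat{\beta}_0(s,t)$, and we note that since $E(G_i(T_{ij}, T_{il})) \approx C(T_{ij}, T_{il}) + \sigma^2 \delta_{jl}$,
in (A.2) one should only use the off diagonal entries of the empirical
covariance, that is, $G_i(T_{ij}, T_{il})$, $j \ne l$. The function $V(t) = C(t,t) + \sigma^2$ is
fitted by

$$(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} \sum_{j=1}^{n_i} K_1\left(\frac{T_{ij} - t}{h_V}\right) \{G_i(T_{ij}, T_{ij}) - \beta_0 - \beta_1 (t - T_{ij})\}^2, \tag{A.3}$$

where $K_1$ is the one-dimensional kernel, $h_V$ a bandwidth, and $\hat{V}(t) = \hat{\beta}_0(t)$.
Then the estimate of the error variance $\sigma^2$ is given by

$$\hat{\sigma}^2 = \frac{1}{|\mathcal{T}_1|} \int_{\mathcal{T}_1} (\hat{V}(t) - \hat{C}(t,t))\, dt, \tag{A.4}$$

where $\mathcal{T}_1$ is the middle half of the interval $\mathcal{T}$.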
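
The local linear fit of the mean curve in (A.1) can be illustrated with a short sketch. The Python code below is a minimal illustration under stated assumptions, not the R functions of Supplement B: the names `epanechnikov` and `local_linear_mean` and the toy data are hypothetical, and the bandwidth `h` is simply supplied by the caller rather than chosen by leave-one-curve-out cross validation.

```python
def epanechnikov(x):
    """Univariate Epanechnikov kernel (3/4)(1 - x^2) on [-1, 1]."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def local_linear_mean(t, times, values, h):
    """Local linear estimate of the mean at t from pooled (time, value) pairs.

    Minimizes sum_w {Y - b0 - b1 (t - T)}^2 with kernel weights w and
    returns the fitted intercept b0.
    """
    s0 = s1 = s2 = r0 = r1 = 0.0
    for T, Y in zip(times, values):
        w = epanechnikov((T - t) / h)
        d = t - T
        s0 += w
        s1 += w * d
        s2 += w * d * d
        r0 += w * Y
        r1 += w * d * Y
    det = s0 * s2 - s1 * s1  # determinant of the 2x2 normal equations
    if det <= 1e-12:
        raise ValueError("too few observations within the bandwidth window")
    return (s2 * r0 - s1 * r1) / det


# Hypothetical pooled toy data lying exactly on the line 2 + 3 T.
times = [i / 10 for i in range(11)]
values = [2.0 + 3.0 * T for T in times]
estimate = local_linear_mean(0.5, times, values, h=0.3)
```

Because a local linear fit reproduces a straight line exactly, the estimate at 0.5 for this toy input agrees with 2 + 3(0.5) up to rounding error, which gives a simple sanity check of the normal equations.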

We next state a series of auxiliary lemmas. The first lemma summarizes
asymptotic results from Yao, Müller and Wang (2005). The set of assumptions
(A1.1)-(A4) and (B1.1)-(B2.2b) is given in Yao, Müller and Wang
(2005) and will not be repeated here.

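Lemma A.1 [Theorem 1, Corollary 1 and Theorem 2 of Yao, Müller and Wang
(2005)]. Under (A1.1)-(A4) and (B1.1)-(B2.2b) with $\nu = 0$, $\ell = 2$ in (B2.2a)
and $\nu = (0,0)$, $\ell = 2$ in (B2.2b),

$$\sup_{t \in \mathcal{T}} |\hat{\mu}(t) - \mu(t)| = O_p\left(\frac{1}{\sqrt{n} h_\mu}\right), \qquad \sup_{s,t \in \mathcal{T}} |\hat{C}(s,t) - C(s,t)| = O_p\left(\frac{1}{\sqrt{n} h_G^2}\right),$$

$$|\hat{\sigma}^2 - \sigma^2| = O_p\left(\frac{1}{\sqrt{n}}\left(\frac{1}{h_G^2} + \frac{1}{h_V}\right)\right), \qquad |\hat{\lambda}_k - \lambda_k| = O_p\left(\frac{1}{\sqrt{n} h_G^2}\right),$$

$$\|\hat{\phi}_k - \phi_k\|_H = O_p\left(\frac{1}{\sqrt{n} h_G^2}\right), \qquad \sup_{t \in \mathcal{T}} |\hat{\phi}_k(t) - \phi_k(t)| = O_p\left(\frac{1}{\sqrt{n} h_G^2}\right).$$

Lemma A.2. Under the normality assumption and the set of assumptions
as in Lemma A.1,

$$\lim_{n \to +\infty} \hat{D}^{(K)}(i,j) = \tilde{D}^{(K)}(i,j) \quad \text{in probability.}$$

Proof. Recall that $\hat{D}^{(K)}(i,j)$ is given by (2.8), which under the normality
assumption equals $\tilde{D}^{(K)}(i,j)$ with unknown model components replaced
by their estimates. The result follows from Lemma A.1 and Slutsky's theorem. □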

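Lemma A.3.

$$\lim_{K \to \infty} \tilde{D}^{(K)}(i,j) = \tilde{D}(i,j) \quad \text{in probability.}$$

Proof. By definition,

$$\tilde{D}^2(i,j) - \tilde{D}^{(K)}(i,j)^2 = E\left(\sum_{k=K+1}^{\infty} (\xi_{ik} - \xi_{jk})^2 \,\Big|\, \mathbf{Y}_i, \mathbf{Y}_j\right) \ge 0.$$

Thus,

$$E(\tilde{D}^2(i,j) - \tilde{D}^{(K)}(i,j)^2) = E\left(\sum_{k=K+1}^{\infty} (\xi_{ik} - \xi_{jk})^2\right).$$

Note that $\xi_{ik}$ and $\xi_{jk}$ are independent with mean zero and variance $\lambda_k$,
and therefore,

$$E\left(\sum_{k=K+1}^{\infty} (\xi_{ik} - \xi_{jk})^2\right) = 2 \sum_{k=K+1}^{\infty} \lambda_k.$$

Since $\sum_{k=1}^{\infty} \lambda_k < \infty$, then

$$\lim_{K \to \infty} \sum_{k=K+1}^{\infty} \lambda_k = 0.$$

Therefore, by Markov's inequality and the fact that $\tilde{D}^2(i,j) - \tilde{D}^{(K)}(i,j)^2 \ge 0$,
for any $\varepsilon > 0$,

$$P(|\tilde{D}^2(i,j) - \tilde{D}^{(K)}(i,j)^2| > \varepsilon) \le E(\tilde{D}^2(i,j) - \tilde{D}^{(K)}(i,j)^2)/\varepsilon = 2 \sum_{k=K+1}^{\infty} \lambda_k / \varepsilon \to 0.$$

The result follows from Slutsky's theorem. □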
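
The key step of this proof, the bound by twice the tail sum of the eigenvalues, can be checked numerically. The Python sketch below is an illustration only: the eigenvalue sequence proportional to $k^{-2}$, the finite truncation of that sequence, the Gaussian scores used in the simulation and the function names `tail_bound` and `simulated_tail_mean` are all assumptions made for this example and are not estimates from the auction data.

```python
import random


def tail_bound(eigenvalues, K, eps):
    """Markov bound 2 * sum_{k > K} lambda_k / eps, capped at 1."""
    return min(1.0, 2.0 * sum(eigenvalues[K:]) / eps)


def simulated_tail_mean(eigenvalues, K, n_draws=20000, seed=0):
    """Monte Carlo mean of sum_{k > K} (xi_ik - xi_jk)^2 for independent scores.

    Scores are drawn as N(0, lambda_k); normality is an assumption of this
    simulation only, since the identity needs just mean zero and variance lambda_k.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        for lam in eigenvalues[K:]:
            sd = lam ** 0.5
            diff = rng.gauss(0.0, sd) - rng.gauss(0.0, sd)
            total += diff * diff
    return total / n_draws


# Hypothetical summable eigenvalues lambda_k = k^(-2), truncated at k = 50.
eigenvalues = [1.0 / k ** 2 for k in range(1, 51)]
bounds = [tail_bound(eigenvalues, K, eps=0.5) for K in (5, 10, 20, 40)]

# Short hypothetical sequence for the simulation check with K = 2.
short = [1.0, 1.0 / 4, 1.0 / 9, 1.0 / 16, 1.0 / 25]
exact = 2.0 * sum(short[2:])
approx = simulated_tail_mean(short, K=2)
```

The bounds shrink as `K` grows, mirroring the convergence to zero used in the proof, and the simulated mean should lie close to the exact value of twice the tail sum.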
Proof of Theorem 2.1. Note that

$$|\hat{D}^{(K)}(i,j) - \tilde{D}(i,j)| \le |\hat{D}^{(K)}(i,j) - \tilde{D}^{(K)}(i,j)| + |\tilde{D}^{(K)}(i,j) - \tilde{D}(i,j)|.$$

By Lemma A.3, for any $\varepsilon > 0$ and any $\delta > 0$, there exists $K_0$ such that, for
any $K > K_0$,

$$P(|\tilde{D}^{(K)}(i,j) - \tilde{D}(i,j)| > \varepsilon/2) < \delta/2.$$

By Lemma A.2, for each $K > 0$, there exists $n_0(K) > 0$ such that, for any
$n > n_0(K)$,

$$P(|\hat{D}^{(K)}(i,j) - \tilde{D}^{(K)}(i,j)| > \varepsilon/2) < \delta/2.$$

Therefore, for $K > K_0$, $n > n_0(K)$, $P(|\hat{D}^{(K)}(i,j) - \tilde{D}(i,j)| > \varepsilon) < \delta$, which
concludes the proof. □

SUPPLEMENTARY MATERIAL

Supplement A: eBay codes (DOI: 10.1214/08-AOAS172SUPPA; .txt). 

Supplement B: R functions used for FPCA and conditional distance analysis
(DOI: 10.1214/08-AOAS172SUPPB; .txt). These functions are used in
eBay_codes.txt.